
    Optimal Deceptive and Reference Policies for Supervisory Control

    Full text link
    The use of deceptive strategies is important for an agent that attempts not to reveal its intentions in an adversarial environment. We consider a setting in which a supervisor provides a reference policy and expects an agent to follow the reference policy and perform a task. The agent may instead follow a different, deceptive policy to achieve a different task. We model the environment and the behavior of the agent with a Markov decision process, represent the tasks of the agent and the supervisor with linear temporal logic formulae, and study the synthesis of optimal deceptive policies for such agents. We also study the synthesis of optimal reference policies that prevent deceptive strategies of the agent and achieve the supervisor's task with high probability. We show that the synthesis of deceptive policies admits a convex optimization formulation, while the synthesis of reference policies requires solving a nonconvex optimization problem. Comment: 20 pages.
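
    To make the convex formulation concrete, here is a minimal sketch (not the paper's exact formulation) of deceptive-policy synthesis as a convex program over discounted occupancy measures: the agent's occupancy is kept close, in KL divergence, to the reference policy's occupancy while placing extra visitation on the agent's own goal. The toy MDP, the KL objective, and the visitation constraint are illustrative assumptions.

```python
# Illustrative sketch of deceptive-policy synthesis as a convex program.
# Requires: numpy, cvxpy (with an exponential-cone-capable solver).
import numpy as np
import cvxpy as cp

nS, nA, gamma = 4, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(nS), size=(nS, nA))    # P[s, a, s'] transition kernel
mu0 = np.array([1.0, 0.0, 0.0, 0.0])             # initial state distribution
agent_goal = 3                                   # the agent's covert target state

# Occupancy measure of the supervisor's (uniform) reference policy.
pi_ref = np.full((nS, nA), 1.0 / nA)
P_ref = np.einsum("sap,sa->sp", P, pi_ref)
d_ref_s = np.linalg.solve(np.eye(nS) - gamma * P_ref.T, mu0)
d_ref = d_ref_s[:, None] * pi_ref                # reference occupancy d_ref[s, a]

# Decision variable: the agent's occupancy measure x[s, a].
x = cp.Variable((nS, nA), nonneg=True)
flow = [cp.sum(x[s, :]) ==
        mu0[s] + gamma * cp.sum(cp.multiply(P[:, :, s], x)) for s in range(nS)]
# Stand-in for the agent's own task: 10% more discounted visitation of its
# goal state than the reference policy would produce.
task = [cp.sum(x[agent_goal, :]) >= 1.1 * d_ref_s[agent_goal]]

# Deceptiveness objective: stay close in KL divergence to the reference.
problem = cp.Problem(cp.Minimize(cp.sum(cp.kl_div(x, d_ref))), flow + task)
problem.solve()

d_s = x.value.sum(axis=1)
pi_deceptive = x.value / np.maximum(d_s[:, None], 1e-12)  # recover the policy
print(np.round(pi_deceptive, 3))
```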

    Smooth Convex Optimization using Sub-Zeroth-Order Oracles

    Full text link
    We consider the problem of minimizing a smooth, Lipschitz, convex function over a compact, convex set using sub-zeroth-order oracles: an oracle that outputs the sign of the directional derivative for a given point and a given direction, an oracle that compares the function values for a given pair of points, and an oracle that outputs a noisy function value for a given point. We show that the sample complexity of optimization using these oracles is polynomial in the relevant parameters. The optimization algorithm that we provide for the comparator oracle is the first algorithm with a known rate of convergence that is polynomial in the number of dimensions. We also give an algorithm for the noisy-value oracle that incurs a regret of $\tilde{\mathcal{O}}(n^{3.75} T^{0.75})$ (ignoring the other factors and logarithmic dependencies), where $n$ is the number of dimensions and $T$ is the number of queries. Comment: Extended version of the accepted paper in the 35th AAAI Conference on Artificial Intelligence 2021. 19 pages including supplementary material.
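
    The paper's algorithms are not reproduced here; as a minimal sketch under illustrative assumptions, the snippet below wraps a known test function with the three oracle types from the abstract and then uses only the sign-of-directional-derivative oracle to run a bisection line search along a segment, a basic primitive such methods build on. The finite-difference implementation of the sign oracle and the test function are assumptions.

```python
import numpy as np

def make_oracles(f, noise=0.01, seed=0):
    rng = np.random.default_rng(seed)
    def sign_oracle(x, d, h=1e-6):
        # Sign of the directional derivative of f at x along direction d
        # (approximated here by a finite difference; the abstract's oracle
        # returns this sign directly).
        return np.sign(f(x + h * d) - f(x))
    def comparator_oracle(x, y):
        # True iff f(x) <= f(y).
        return f(x) <= f(y)
    def noisy_value_oracle(x):
        # Function value corrupted by additive Gaussian noise.
        return f(x) + rng.normal(0.0, noise)
    return sign_oracle, comparator_oracle, noisy_value_oracle

def bisect_on_segment(sign_oracle, a, b, iters=50):
    # Minimize a smooth convex f over the segment [a, b] using only the
    # sign of the directional derivative along the segment.
    d = b - a
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sign_oracle(a + mid * d, d) > 0:   # f increasing: minimizer is left
            hi = mid
        else:
            lo = mid
    return a + 0.5 * (lo + hi) * d

f = lambda x: float(np.sum((x - 1.0) ** 2))   # smooth convex test function
sign_o, cmp_o, noisy_o = make_oracles(f)
x_best = bisect_on_segment(sign_o, np.zeros(2), np.array([3.0, 0.0]))
print(x_best)   # close to (1, 0), the minimizer on this segment
```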

    Alternating Direction Method of Multipliers for Decomposable Saddle-Point Problems

    Full text link
    Saddle-point problems appear in various settings including machine learning, zero-sum stochastic games, and regression problems. We consider decomposable saddle-point problems and study an extension of the alternating direction method of multipliers to such saddle-point problems. Instead of solving the original saddle-point problem directly, this algorithm solves smaller saddle-point problems by exploiting the decomposable structure. We show the convergence of this algorithm for convex-concave saddle-point problems under a mild assumption. We also provide a sufficient condition under which the assumption holds. We demonstrate the convergence properties of the saddle-point alternating direction method of multipliers with numerical examples on a power allocation problem in communication channels and a network routing problem with adversarial costs. Comment: Accepted to 58th Annual Allerton Conference on Communication, Control, and Computing.
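
    The paper's saddle-point ADMM variant is not reproduced here; as a baseline illustration of solving a convex-concave saddle-point problem, the sketch below runs projected gradient descent-ascent with iterate averaging on a bilinear matrix game and reports the duality gap of the averaged iterates. The game, step size, and iteration count are illustrative.

```python
import numpy as np

def project_simplex(v):
    # Euclidean projection onto the probability simplex.
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))            # payoff matrix of min_x max_y x^T A y
x = np.full(5, 0.2)                    # minimizing player's mixed strategy
y = np.full(5, 0.2)                    # maximizing player's mixed strategy
x_avg = np.zeros(5); y_avg = np.zeros(5)
T, eta = 20000, 0.02
for _ in range(T):
    gx, gy = A @ y, A.T @ x            # gradients of x^T A y in x and y
    x = project_simplex(x - eta * gx)  # descent step (minimizer)
    y = project_simplex(y + eta * gy)  # ascent step (maximizer)
    x_avg += x; y_avg += y
x_avg /= T; y_avg /= T

# Duality gap of the averaged iterates; it is 0 exactly at a saddle point
# and shrinks with smaller steps and more iterations.
gap = np.max(A.T @ x_avg) - np.min(A @ y_avg)
print(f"duality gap ~ {gap:.4f}")
```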

    Differential Privacy in Cooperative Multiagent Planning

    Full text link
    Privacy-aware multiagent systems must protect agents' sensitive data while simultaneously ensuring that agents accomplish their shared objectives. Towards this goal, we propose a framework to privatize inter-agent communications in cooperative multiagent decision-making problems. We study sequential decision-making problems formulated as cooperative Markov games with reach-avoid objectives. We apply a differential privacy mechanism to privatize agents' communicated symbolic state trajectories, and then we analyze tradeoffs between the strength of privacy and the team's performance. For a given level of privacy, this tradeoff is shown to depend critically upon the total correlation among agents' state-action processes. We synthesize policies that are robust to the privacy mechanism by reducing the value of the total correlation. Numerical experiments demonstrate that the team's performance under these policies decreases by only 3 percent when comparing private versus non-private implementations of communication. By contrast, the team's performance decreases by roughly 86 percent when using baseline policies that ignore total correlation and only optimize team performance.
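
    The paper's exact mechanism is not reproduced here; as a minimal sketch of privatizing communicated symbolic states, the snippet below applies k-ary randomized response, a standard mechanism that provides epsilon-differential privacy for each reported symbol. The alphabet and trajectory are illustrative.

```python
import numpy as np

def randomized_response(symbol, alphabet, epsilon, rng):
    # k-ary randomized response: report the true symbol with probability
    # e^eps / (e^eps + k - 1), otherwise a uniform draw from the rest.
    # The likelihood ratio between any two inputs is at most e^eps.
    k = len(alphabet)
    p_true = np.exp(epsilon) / (np.exp(epsilon) + k - 1)
    if rng.random() < p_true:
        return symbol
    others = [s for s in alphabet if s != symbol]
    return others[rng.integers(len(others))]

rng = np.random.default_rng(0)
alphabet = ["safe", "corridor", "goal", "hazard"]
trajectory = ["safe", "corridor", "corridor", "goal"]
for eps in (0.5, 2.0, 5.0):
    private = [randomized_response(s, alphabet, eps, rng) for s in trajectory]
    print(eps, private)   # stronger privacy (small eps) distorts more symbols
```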

    Formal Methods for Autonomous Systems

    Full text link
    Formal methods refer to rigorous, mathematical approaches to system development and have played a key role in establishing the correctness of safety-critical systems. The main building blocks of formal methods are models and specifications, which are analogous to behaviors and requirements in system design and give us the means to verify and synthesize system behaviors with formal guarantees. This monograph provides a survey of the current state of the art on applications of formal methods in the autonomous systems domain. We consider correct-by-construction synthesis under various formulations, including closed systems, reactive, and probabilistic settings. Beyond synthesizing systems in known environments, we address uncertainty and use formal methods to bound the behavior of systems that employ learning. Further, we examine the synthesis of systems with monitoring, a mitigation technique for ensuring that once a system deviates from expected behavior, it knows a way of returning to normalcy. We also show how to overcome some limitations of formal methods themselves with learning. We conclude with future directions for formal methods in reinforcement learning, uncertainty, privacy, explainability of formal methods, and regulation and certification.
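
    As one concrete building block behind correct-by-construction synthesis, the sketch below computes the set of states from which a controller can force a reachability objective in a finite transition system with adversarial nondeterminism, via the standard least-fixed-point (attractor) iteration. The toy transition system is an illustrative assumption.

```python
def winning_set(states, actions, post, goal):
    # post[(s, a)]: set of possible successors of s under action a.
    # Returns every state from which some action guarantees eventually
    # reaching `goal`, however the nondeterminism resolves. Computed as the
    # least fixed point of the controllable-predecessor operator.
    win = set(goal)
    while True:
        new = {s for s in states
               if s not in win and any(post[(s, a)] <= win for a in actions)}
        if not new:
            return win
        win |= new

states = {0, 1, 2, 3}
actions = {"a", "b"}
post = {
    (0, "a"): {1, 2}, (0, "b"): {0},
    (1, "a"): {3},    (1, "b"): {0},
    (2, "a"): {3},    (2, "b"): {2},
    (3, "a"): {3},    (3, "b"): {3},
}
print(winning_set(states, actions, post, goal={3}))   # {0, 1, 2, 3}
```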

    On the Sample Complexity of Vanilla Model-Based Offline Reinforcement Learning with Dependent Samples

    No full text
    Offline reinforcement learning (offline RL) considers problems where learning is performed using only previously collected samples and is helpful in settings where collecting new data is costly or risky. In model-based offline RL, the learner performs estimation (or optimization) using a model constructed according to the empirical transition frequencies. We analyze the sample complexity of vanilla model-based offline RL with dependent samples in the infinite-horizon discounted-reward setting. In our setting, the samples obey the dynamics of the Markov decision process and, consequently, may have interdependencies. Without assuming independent samples, we provide a high-probability, polynomial sample complexity bound for vanilla model-based off-policy evaluation that requires partial or uniform coverage. We extend this result to off-policy optimization under uniform coverage. As a comparison to the model-based approach, we analyze the sample complexity of off-policy evaluation with vanilla importance sampling in the infinite-horizon setting. Finally, we provide an estimator that outperforms the sample-mean estimator for almost deterministic dynamics that are prevalent in reinforcement learning.
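
    To make "vanilla model-based" concrete, here is a minimal sketch (illustrative, not the paper's exact setup): the transition kernel and rewards are estimated by empirical frequencies from logged transitions, and a target policy is then evaluated on the estimated model by solving the Bellman linear system. The fallback for unvisited state-action pairs and the synthetic log are assumptions.

```python
import numpy as np

def evaluate_offline(transitions, pi, nS, nA, gamma=0.9):
    # transitions: logged (s, a, r, s') tuples from some behavior policy.
    # pi[s, a]: target policy to evaluate on the estimated model.
    counts = np.zeros((nS, nA, nS))
    rew_sum = np.zeros((nS, nA))
    for s, a, r, s2 in transitions:
        counts[s, a, s2] += 1.0
        rew_sum[s, a] += r
    n_sa = counts.sum(axis=2)
    # Empirical transition frequencies; unvisited (s, a) pairs fall back to
    # a zero-reward self-loop (this is where coverage assumptions bite).
    P_hat = np.where(n_sa[:, :, None] > 0,
                     counts / np.maximum(n_sa[:, :, None], 1.0),
                     np.eye(nS)[:, None, :])
    r_hat = rew_sum / np.maximum(n_sa, 1.0)
    # Policy evaluation on the model: solve V = r_pi + gamma * P_pi V.
    P_pi = np.einsum("sap,sa->sp", P_hat, pi)
    r_pi = np.einsum("sa,sa->s", r_hat, pi)
    return np.linalg.solve(np.eye(nS) - gamma * P_pi, r_pi)

# Tiny synthetic log: a 2-state, 2-action chain visited by a random policy.
rng = np.random.default_rng(0)
data = []
for _ in range(500):
    s, a = int(rng.integers(2)), int(rng.integers(2))
    data.append((s, a, float(s == 1), int(rng.random() < 0.7)))
pi = np.full((2, 2), 0.5)
print(evaluate_offline(data, pi, nS=2, nA=2))
```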

    Influence of Menstrual Cycle on P Wave Dispersion

    No full text
    Female gender is an independent risk factor for some types of arrhythmias. We sought to determine whether the menstrual cycle affects P wave dispersion, which is a predictor of atrial fibrillation. The study population consisted of 59 women in follicular phase (mean age, 29.3 +/- 7.7 years) (group F) and 53 women in luteal phase (mean age, 28.1 +/- 6.8 years) (group L). The ECGs of 35 patients (mean age, 26.4 +/- 4.5 years) were obtained in both follicular and luteal phases. Both groups underwent a standard 12-lead surface electrocardiogram recorded at 50 mm/s. Maximal (Pmax) and minimal P wave durations (Pmin) were measured. P wave dispersion (PD) was defined as the difference between Pmax and Pmin. PD was significantly higher in group L than group F (46.6 +/- 18.5 versus 40.1 +/- 12.7; P < 0.05). Pmin was significantly lower in group L than group F (51.6 +/- 12.1 versus 59.1 +/- 12.1; P = 0.002). When we compared ECGs in different phases of the 35 patients, PD was significantly higher in luteal phase than follicular phase (53.2 +/- 12.3 versus 42.8 +/- 10.2; P < 0.05). Pmin was significantly lower in luteal phase than follicular phase (47.6 +/- 6.6 versus 56 +/- 10.1; P = 0.05). We detected a significant correlation between the day of the menses and PD (r = 0.27; P < 0.05). PD was increased in luteal phase compared to follicular phase, and this difference was more prominent as the days of the cycle progressed. (Int Heart J 2011; 52: 23-26)
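
    For concreteness, the snippet below computes the abstract's core quantity, P wave dispersion PD = Pmax - Pmin across the 12 leads, and runs a two-sample comparison between two groups on purely synthetic durations; none of the numbers are the study's data.

```python
import numpy as np
from scipy import stats

def p_wave_dispersion(durations_ms):
    # PD is defined as the difference between the maximal and minimal
    # P wave durations measured across the 12 ECG leads.
    return durations_ms.max() - durations_ms.min()

# Purely synthetic per-lead durations for two hypothetical groups; the
# standard deviations are arbitrary and only create a group difference.
rng = np.random.default_rng(0)
pd_f = np.array([p_wave_dispersion(rng.normal(100, 12, 12)) for _ in range(59)])
pd_l = np.array([p_wave_dispersion(rng.normal(100, 14, 12)) for _ in range(53)])

t, p = stats.ttest_ind(pd_l, pd_f, equal_var=False)   # Welch's t-test
print(f"mean PD: luteal {pd_l.mean():.1f} ms vs "
      f"follicular {pd_f.mean():.1f} ms, p = {p:.3f}")
```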

    The Value of P wave dispersion in predicting reperfusion and infarct related artery patency in acute anterior myocardial infarction

    No full text
    Purpose: The aim of this study is to investigate whether P wave dispersion (PWD), measured before, during, and after fibrinolytic therapy (FT), is able to predict successful reperfusion and infarct related artery (IRA) patency in patients with acute anterior MI who received FT.